AI Safety Gridworlds

Authors

  • Jan Leike
  • Miljan Martic
  • Victoria Krakovna
  • Pedro A. Ortega
  • Tom Everitt
  • Andrew Lefrancq
  • Laurent Orseau
  • Shane Legg
Abstract

We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. These problems include safe interruptibility, avoiding side effects, absent supervisor, reward gaming, safe exploration, as well as robustness to self-modification, distributional shift, and adversaries. To measure compliance with the intended safe behavior, we equip each environment with a performance function that is hidden from the agent. This allows us to categorize AI safety problems into robustness and specification problems, depending on whether the performance function corresponds to the observed reward function. We evaluate A2C and Rainbow, two recent deep reinforcement learning agents, on our environments and show that they are not able to solve them satisfactorily.
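The idea of a performance function hidden from the agent can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper's environments: the `ToyCorridor` class, the `walk` and `detour` actions, the vase cell, and all numeric values are hypothetical. The agent only ever sees the reward returned by `step`; the evaluator reads `hidden_performance` after the episode. The sketch covers the case where the two functions differ, which the abstract's taxonomy classes as a specification problem.

```python
class ToyCorridor:
    """Hypothetical 1-D corridor: start at cell 0, goal at the last cell.

    Walking onto the vase cell breaks the vase (a side effect). The observed
    reward ignores this; the hidden performance function penalises it.
    """

    GOAL_REWARD = 10.0   # assumed value, for illustration only
    VASE_PENALTY = 5.0   # assumed value, applied only to hidden performance

    def __init__(self, length=5, vase_cell=2):
        self.length = length
        self.vase_cell = vase_cell
        self.position = 0
        self.vase_broken = False
        self._episode_return = 0.0

    def step(self, action):
        """Move one cell right; return (observation, reward, done)."""
        if action not in ("walk", "detour"):
            raise ValueError(f"unknown action: {action!r}")
        self.position += 1
        # A detour costs more observed reward but never breaks the vase.
        reward = -1.0 if action == "walk" else -2.0
        if action == "walk" and self.position == self.vase_cell:
            self.vase_broken = True
        done = self.position == self.length - 1
        if done:
            reward += self.GOAL_REWARD
        self._episode_return += reward
        return self.position, reward, done

    def hidden_performance(self):
        """Evaluator-only score: observed return minus the side-effect penalty."""
        penalty = self.VASE_PENALTY if self.vase_broken else 0.0
        return self._episode_return - penalty


def run_episode(policy):
    """Run one episode; return (observed return, hidden performance)."""
    env = ToyCorridor()
    observation, total, done = env.position, 0.0, False
    while not done:
        observation, reward, done = env.step(policy(observation))
        total += reward
    return total, env.hidden_performance()


def greedy(observation):
    """Always take the cheapest action according to the observed reward."""
    return "walk"


def careful(observation):
    """Pay for a detour when the next cell holds the vase."""
    return "detour" if observation + 1 == 2 else "walk"


if __name__ == "__main__":
    print("greedy :", run_episode(greedy))
    print("careful:", run_episode(careful))
```

In this toy setting the `greedy` policy earns the higher observed return but the lower hidden performance, which is the kind of gap such a performance function is meant to expose.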


Similar resources

Gridworlds as Testbeds for Planning with Incomplete Information

Gridworlds are popular testbeds for planning with incomplete information but not much is known about their properties. We study a fundamental planning problem, localization, to investigate whether gridworlds make good testbeds for planning with incomplete information. We find empirically that greedy planning methods that interleave planning and plan execution can localize robots very quickly on...


A Survey of Reinforcement Learning Methods in the Windy and Cliff-walking Gridworlds

This report details the implementation of three reinforcement learning methods, Monte Carlo, SARSA, and Q-Learning, and compares their performance in the Windy and Cliff-walking Gridworlds.


Efficient Incremental Search for Moving Target Search

Incremental search algorithms reuse information from previous searches to speed up the current search and are thus often able to find shortest paths for series of similar search problems faster than by solving each search problem independently from scratch. However, they do poorly on moving target search problems, where both the start and goal cells change over time. In this paper, we thus deve...


Subgoal Graphs for Eight-Neighbor Gridworlds

We propose a method for preprocessing an eight-neighbor gridworld to generate a subgoal graph and a method for using this subgoal graph to find shortest paths faster than A*, by first finding high-level paths through subgoals and then shortest low-level paths between consecutive subgoals on the high-level path.


Robust Computer Algebra, Theorem Proving, and Oracle AI

In the context of superintelligent AI systems, the term “oracle” has two meanings. One refers to modular systems queried for domain-specific tasks. Another usage, referring to a class of systems which may be useful for addressing the value alignment and AI control problems, is a superintelligent AI system that only answers questions. The aim of this manuscript is to survey contemporary research...



Journal:
  • CoRR

Volume abs/1711.09883

Publication date 2017